Overview

Dataset statistics

Number of variables: 9
Number of observations: 17000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.2 MiB
Average record size in memory: 72.0 B
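The memory figures above can be cross-checked with simple arithmetic. The sketch below assumes that all nine columns are stored as 8-byte floats and that the index adds a fixed 128 bytes; neither detail is stated in the report, so treat both constants as assumptions.

```python
# Back-of-the-envelope check of the memory figures in the overview.
N_ROWS = 17000
N_COLS = 9
BYTES_PER_VALUE = 8  # assumption: every column is float64
INDEX_BYTES = 128    # assumption: size of a default pandas RangeIndex

column_bytes = N_ROWS * BYTES_PER_VALUE
total_bytes = N_COLS * column_bytes + INDEX_BYTES

total_mib = total_bytes / 2**20
record_bytes = total_bytes / N_ROWS
column_kib = (column_bytes + INDEX_BYTES) / 1024

print(f"Total size: {total_mib:.1f} MiB")
print(f"Average record size: {record_bytes:.1f} B")
print(f"Per-variable memory size: {column_kib:.1f} KiB")
```

Under these assumptions the three values agree with the reported 1.2 MiB, 72.0 B and the 132.9 KiB listed for each variable.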

Variable types

Numeric: 9

Warnings

longitude is highly correlated with latitude (High correlation)
latitude is highly correlated with longitude (High correlation)
total_rooms is highly correlated with total_bedrooms and 1 other field (High correlation)
total_bedrooms is highly correlated with total_rooms and 1 other field (High correlation)
population is highly correlated with households (High correlation)
households is highly correlated with total_rooms and 2 other fields (High correlation)
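To illustrate what such a warning reflects, here is a minimal sketch that computes the Pearson correlation between total_rooms and total_bedrooms on the ten "First rows" shown in the Sample section. The 0.9 threshold is an assumption for illustration, and the report may rely on other correlation measures as well; ten rows are far too few to reproduce the report's own coefficients.

```python
from math import sqrt

# First ten sample rows of the report (total_rooms, total_bedrooms).
total_rooms = [5612, 7650, 720, 1501, 1454, 1387, 2907, 812, 4789, 1497]
total_bedrooms = [1283, 1901, 174, 337, 326, 236, 680, 168, 1175, 309]

HIGH_CORRELATION_THRESHOLD = 0.9  # assumption, not stated in the report


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(total_rooms, total_bedrooms)
is_high = abs(r) > HIGH_CORRELATION_THRESHOLD
print(f"r = {r:.3f}, high correlation: {is_high}")
```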

Reproduction

Analysis started: 2021-01-27 11:46:21.655030
Analysis finished: 2021-01-27 11:46:39.560021
Duration: 17.9 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml
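The reported duration follows directly from the two timestamps. A small sketch, assuming both are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2021-01-27 11:46:21.655030", FMT)
finished = datetime.strptime("2021-01-27 11:46:39.560021", FMT)

# Elapsed wall-clock time between the two timestamps, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.1f} seconds")
```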

Variables

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct: 827
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -119.5621082
Minimum: -124.35
Maximum: -114.31
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: -124.35
5-th percentile: -122.47
Q1: -121.79
median: -118.49
Q3: -118
95-th percentile: -117.07
Maximum: -114.31
Range: 10.04
Interquartile range (IQR): 3.79
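Range and IQR are derived from the other quantile statistics: the range is maximum minus minimum, and the IQR is Q3 minus Q1. A short check using the longitude values listed above:

```python
# Quantile statistics for longitude, copied from the report.
minimum, q1, q3, maximum = -124.35, -121.79, -118.0, -114.31

value_range = maximum - minimum  # spread of the whole variable
iqr = q3 - q1                    # spread of the middle 50% of values

print(f"Range: {value_range:.2f}")
print(f"Interquartile range (IQR): {iqr:.2f}")
```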

Descriptive statistics

Standard deviation: 2.005166408
Coefficient of variation (CV): -0.0167709188
Kurtosis: -1.322329668
Mean: -119.5621082
Median Absolute Deviation (MAD): 1.28
Skewness: -0.3040029768
Sum: -2032555.84
Variance: 4.020692325
Monotonicity: Decreasing
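Two of the descriptive statistics are simple functions of the mean and the standard deviation: the coefficient of variation is the standard deviation divided by the mean (hence negative here, because the mean longitude is negative), and the variance is the standard deviation squared. A quick check with the longitude figures:

```python
# Values copied from the longitude section of the report.
mean = -119.5621082
std = 2.005166408

cv = std / mean       # coefficient of variation; sign follows the mean
variance = std ** 2   # variance is the squared standard deviation

print(f"CV: {cv:.10f}")
print(f"Variance: {variance:.6f}")
```

Because the CV inherits the sign of the mean, it is of limited use for variables such as longitude whose values are all negative.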
Histogram with fixed size bins (bins=50)
Common values

Value               Count   Frequency (%)
-118.31             136     0.8%
-118.3              128     0.8%
-118.32             124     0.7%
-118.29             118     0.7%
-118.35             116     0.7%
-118.36             115     0.7%
-118.27             114     0.7%
-118.28             113     0.7%
-118.37             111     0.7%
-118.19             110     0.6%
Other values (817)  15815   93.0%

Minimum 5 values

Value               Count   Frequency (%)
-124.35             1       < 0.1%
-124.3              2       < 0.1%
-124.27             1       < 0.1%
-124.26             1       < 0.1%
-124.25             1       < 0.1%

Maximum 5 values

Value               Count   Frequency (%)
-114.31             1       < 0.1%
-114.47             1       < 0.1%
-114.56             1       < 0.1%
-114.57             2       < 0.1%
-114.58             2       < 0.1%
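The frequency column in these tables is each count divided by the 17000 observations, and the "Other values" row is whatever remains after the ten most common values. The sketch below reproduces both for longitude. The display rule for "< 0.1%" is inferred from the tables in this report (a count of 9 is shown as 0.1%, a count of 7 as < 0.1%) and is an assumption, not documented behaviour.

```python
N_OBSERVATIONS = 17000

# Ten most common longitude values and their counts, as listed in the report.
top_counts = {
    -118.31: 136, -118.3: 128, -118.32: 124, -118.29: 118, -118.35: 116,
    -118.36: 115, -118.27: 114, -118.28: 113, -118.37: 111, -118.19: 110,
}


def format_frequency(count, total=N_OBSERVATIONS):
    """Render a count as a percentage with one decimal place.

    Assumption inferred from the tables: nonzero shares that would
    round to 0.0% are displayed as "< 0.1%".
    """
    text = f"{100 * count / total:.1f}%"
    if count > 0 and text == "0.0%":
        return "< 0.1%"
    return text


other_count = N_OBSERVATIONS - sum(top_counts.values())
print(other_count, format_frequency(other_count))
print(format_frequency(136), format_frequency(9), format_frequency(7))
```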

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 840
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.62522471
Minimum: 32.54
Maximum: 41.95
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 32.54
5-th percentile: 32.82
Q1: 33.93
median: 34.25
Q3: 37.72
95-th percentile: 38.96
Maximum: 41.95
Range: 9.41
Interquartile range (IQR): 3.79

Descriptive statistics

Standard deviation: 2.137339795
Coefficient of variation (CV): 0.05999512459
Kurtosis: -1.112226493
Mean: 35.62522471
Median Absolute Deviation (MAD): 1.2
Skewness: 0.4718011204
Sum: 605628.82
Variance: 4.568221398
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count   Frequency (%)
34.06               205     1.2%
34.08               200     1.2%
34.05               196     1.2%
34.07               194     1.1%
34.04               188     1.1%
34.09               178     1.0%
34.1                171     1.0%
34.02               169     1.0%
34.03               162     1.0%
33.94               146     0.9%
Other values (830)  15191   89.4%

Minimum 5 values

Value               Count   Frequency (%)
32.54               1       < 0.1%
32.55               3       < 0.1%
32.56               9       0.1%
32.57               13      0.1%
32.58               20      0.1%

Maximum 5 values

Value               Count   Frequency (%)
41.95               2       < 0.1%
41.88               1       < 0.1%
41.86               3       < 0.1%
41.84               1       < 0.1%
41.82               1       < 0.1%

housing_median_age
Real number (ℝ≥0)

Distinct: 52
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.58935294
Minimum: 1
Maximum: 52
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 18
median: 29
Q3: 37
95-th percentile: 52
Maximum: 52
Range: 51
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 12.58693698
Coefficient of variation (CV): 0.4402665918
Kurtosis: -0.8008262247
Mean: 28.58935294
Median Absolute Deviation (MAD): 10
Skewness: 0.06489403293
Sum: 486019
Variance: 158.4309826
Monotonicity: Not monotonic
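Since no values are missing, the sum, the mean and the number of observations are tied together (sum = mean × n), and the distinct percentage is the distinct count over n. A short consistency check for housing_median_age, using only numbers from the report:

```python
n_observations = 17000
mean = 28.58935294
reported_sum = 486019
distinct = 52

# With no missing values, the sum equals the mean times the row count.
reconstructed_sum = mean * n_observations
distinct_pct = 100 * distinct / n_observations

print(f"Reconstructed sum: {reconstructed_sum:.0f}")
print(f"Distinct (%): {distinct_pct:.1f}%")
```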
Histogram with fixed size bins (bins=50)
Common values

Value               Count   Frequency (%)
52                  1052    6.2%
36                  715     4.2%
35                  692     4.1%
16                  635     3.7%
17                  576     3.4%
34                  567     3.3%
33                  513     3.0%
26                  503     3.0%
18                  478     2.8%
25                  461     2.7%
Other values (42)   10808   63.6%

Minimum 5 values

Value               Count   Frequency (%)
1                   2       < 0.1%
2                   49      0.3%
3                   46      0.3%
4                   161     0.9%
5                   199     1.2%

Maximum 5 values

Value               Count   Frequency (%)
52                  1052    6.2%
51                  32      0.2%
50                  112     0.7%
49                  111     0.7%
48                  135     0.8%

total_rooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 5533
Distinct (%): 32.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2643.664412
Minimum: 2
Maximum: 37937
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 2
5-th percentile: 626.95
Q1: 1462
median: 2127
Q3: 3151.25
95-th percentile: 6269.05
Maximum: 37937
Range: 37935
Interquartile range (IQR): 1689.25
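Fractional quantiles such as 626.95 for an integer-valued column are consistent with linear interpolation between neighbouring order statistics. The sketch below shows that method; whether the report uses exactly this rule is an assumption, and the ten input values are just the total_rooms entries from the "First rows" sample, not the full column.

```python
from math import floor


def quantile_linear(values, q):
    """Quantile with linear interpolation between the closest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)  # fractional index into the sorted data
    lower = floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# total_rooms from the first ten sample rows (illustration only).
sample = [5612, 7650, 720, 1501, 1454, 1387, 2907, 812, 4789, 1497]
print(quantile_linear(sample, 0.5))
print(quantile_linear(sample, 0.05))
```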

Descriptive statistics

Standard deviation: 2179.947071
Coefficient of variation (CV): 0.8245929634
Kurtosis: 29.51588478
Mean: 2643.664412
Median Absolute Deviation (MAD): 792
Skewness: 4.002729999
Sum: 44942295
Variance: 4752169.234
Monotonicity: Not monotonic
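The large skewness and kurtosis of total_rooms indicate a long right tail. As a rough illustration of how such numbers can be computed, the sketch below implements the bias-adjusted sample skewness and excess kurtosis. That the report uses these exact estimators is an assumption, and the ten sample values cannot reproduce the reported figures.

```python
from math import sqrt


def _standardized(values):
    """Return values centred on the mean and scaled by the sample std."""
    n = len(values)
    mean = sum(values) / n
    s = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / s for x in values]


def sample_skewness(values):
    """Bias-adjusted sample skewness (needs at least 3 values)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    z = _standardized(values)
    return n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (needs at least 4 values)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    z = _standardized(values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * sum(v ** 4 for v in z) - correction


# total_rooms from the first ten sample rows (illustration only).
sample = [5612, 7650, 720, 1501, 1454, 1387, 2907, 812, 4789, 1497]
print(sample_skewness(sample) > 0)  # right-tailed data has positive skew
```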
Histogram with fixed size bins (bins=50)
Common values

Value                Count   Frequency (%)
1582                 16      0.1%
1527                 15      0.1%
1717                 14      0.1%
1471                 14      0.1%
1703                 14      0.1%
1613                 13      0.1%
2053                 13      0.1%
1724                 13      0.1%
1875                 12      0.1%
2017                 12      0.1%
Other values (5523)  16864   99.2%

Minimum 5 values

Value                Count   Frequency (%)
2                    1       < 0.1%
8                    1       < 0.1%
11                   1       < 0.1%
12                   1       < 0.1%
15                   2       < 0.1%

Maximum 5 values

Value                Count   Frequency (%)
37937                1       < 0.1%
32627                1       < 0.1%
32054                1       < 0.1%
30405                1       < 0.1%
30401                1       < 0.1%

total_bedrooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1848
Distinct (%): 10.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 539.4108235
Minimum: 1
Maximum: 6445
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 138
Q1: 297
median: 434
Q3: 648.25
95-th percentile: 1283
Maximum: 6445
Range: 6444
Interquartile range (IQR): 351.25

Descriptive statistics

Standard deviation: 421.4994516
Coefficient of variation (CV): 0.7814071079
Kurtosis: 19.69275009
Mean: 539.4108235
Median Absolute Deviation (MAD): 162
Skewness: 3.322636716
Sum: 9169984
Variance: 177661.7877
Monotonicity: Not monotonic
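The median absolute deviation is a robust counterpart to the standard deviation: the median of the absolute distances from the median. For total_bedrooms it is 162 against a standard deviation of about 421, which reflects how strongly the few very large values inflate the latter. A minimal sketch of the calculation, run on the ten total_bedrooms values from the "First rows" sample rather than the full column:

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (unscaled)."""
    centre = median(values)
    return median(abs(x - centre) for x in values)


# total_bedrooms from the first ten sample rows (illustration only).
sample = [1283, 1901, 174, 337, 326, 236, 680, 168, 1175, 309]
print(median_absolute_deviation(sample))
```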
Histogram with fixed size bins (bins=50)
Common values

Value                Count   Frequency (%)
280                  48      0.3%
309                  44      0.3%
394                  43      0.3%
331                  43      0.3%
345                  43      0.3%
343                  43      0.3%
340                  41      0.2%
290                  41      0.2%
322                  41      0.2%
272                  41      0.2%
Other values (1838)  16572   97.5%

Minimum 5 values

Value                Count   Frequency (%)
1                    1       < 0.1%
2                    1       < 0.1%
3                    4       < 0.1%
4                    6       < 0.1%
5                    4       < 0.1%

Maximum 5 values

Value                Count   Frequency (%)
6445                 1       < 0.1%
5471                 1       < 0.1%
5290                 1       < 0.1%
4957                 1       < 0.1%
4952                 1       < 0.1%

population
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3683
Distinct (%): 21.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1429.573941
Minimum: 3
Maximum: 35682
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 3
5-th percentile: 350.95
Q1: 790
median: 1167
Q3: 1721
95-th percentile: 3297.05
Maximum: 35682
Range: 35679
Interquartile range (IQR): 931

Descriptive statistics

Standard deviation: 1147.852959
Coefficient of variation (CV): 0.8029336057
Kurtosis: 80.86199702
Mean: 1429.573941
Median Absolute Deviation (MAD): 437.5
Skewness: 5.187211878
Sum: 24302757
Variance: 1317566.416
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
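With 50 fixed-size bins spanning the minimum to the maximum, a heavily skewed variable such as population ends up concentrated in the first few bins. The sketch below assumes equal-width bins over [minimum, maximum] with the maximum placed in the last bin; the report does not spell out its binning rule, so this is an illustration only.

```python
def bin_index(value, lower, upper, bins=50):
    """Index of the equal-width bin over [lower, upper] containing value."""
    if not lower <= value <= upper:
        raise ValueError("value lies outside the histogram range")
    width = (upper - lower) / bins
    # The maximum itself would land on index `bins`; clamp it into the last bin.
    return min(int((value - lower) / width), bins - 1)


# Population statistics copied from the report.
minimum, median_value, p95, maximum = 3, 1167, 3297.05, 35682

print(bin_index(median_value, minimum, maximum))  # bin holding the median
print(bin_index(p95, minimum, maximum))           # bin holding the 95th percentile
```

Under this assumption the 95th percentile falls in the fifth bin, so at least 95% of the observations occupy the first five of the 50 bins.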
Common values

Value                Count   Frequency (%)
891                  23      0.1%
1052                 22      0.1%
1227                 19      0.1%
926                  19      0.1%
761                  19      0.1%
810                  19      0.1%
850                  19      0.1%
735                  18      0.1%
1056                 18      0.1%
781                  18      0.1%
Other values (3673)  16806   98.9%

Minimum 5 values

Value                Count   Frequency (%)
3                    1       < 0.1%
6                    1       < 0.1%
8                    2       < 0.1%
9                    2       < 0.1%
11                   1       < 0.1%

Maximum 5 values

Value                Count   Frequency (%)
35682                1       < 0.1%
28566                1       < 0.1%
16122                1       < 0.1%
15507                1       < 0.1%
15037                1       < 0.1%

households
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1740
Distinct (%): 10.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 501.2219412
Minimum: 1
Maximum: 6082
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 126
Q1: 282
median: 409
Q3: 605.25
95-th percentile: 1172.1
Maximum: 6082
Range: 6081
Interquartile range (IQR): 323.25

Descriptive statistics

Standard deviation: 384.5208409
Coefficient of variation (CV): 0.7671668163
Kurtosis: 20.69264455
Mean: 501.2219412
Median Absolute Deviation (MAD): 150
Skewness: 3.342668363
Sum: 8520773
Variance: 147856.2771
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count   Frequency (%)
306                  48      0.3%
386                  48      0.3%
282                  47      0.3%
330                  46      0.3%
426                  45      0.3%
380                  44      0.3%
335                  44      0.3%
284                  43      0.3%
316                  43      0.3%
329                  43      0.3%
Other values (1730)  16549   97.3%

Minimum 5 values

Value                Count   Frequency (%)
1                    1       < 0.1%
2                    2       < 0.1%
3                    2       < 0.1%
4                    4       < 0.1%
5                    7       < 0.1%

Maximum 5 values

Value                Count   Frequency (%)
6082                 1       < 0.1%
5189                 1       < 0.1%
5050                 1       < 0.1%
4769                 1       < 0.1%
4616                 1       < 0.1%

median_income
Real number (ℝ≥0)

Distinct: 11175
Distinct (%): 65.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.8835781
Minimum: 0.4999
Maximum: 15.0001
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 0.4999
5-th percentile: 1.603395
Q1: 2.566375
median: 3.5446
Q3: 4.767
95-th percentile: 7.36447
Maximum: 15.0001
Range: 14.5002
Interquartile range (IQR): 2.200625

Descriptive statistics

Standard deviation: 1.908156518
Coefficient of variation (CV): 0.4913398081
Kurtosis: 4.76414493
Mean: 3.8835781
Median Absolute Deviation (MAD): 1.07405
Skewness: 1.626693098
Sum: 66020.8277
Variance: 3.641061299
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
3.125                 41      0.2%
4.125                 39      0.2%
2.875                 39      0.2%
15.0001               38      0.2%
2.625                 36      0.2%
3.875                 33      0.2%
3.625                 31      0.2%
3                     31      0.2%
4.375                 30      0.2%
3.375                 28      0.2%
Other values (11165)  16654   98.0%

Minimum 5 values

Value                 Count   Frequency (%)
0.4999                11      0.1%
0.536                 7       < 0.1%
0.6433                1       < 0.1%
0.6775                1       < 0.1%
0.6825                1       < 0.1%

Maximum 5 values

Value                 Count   Frequency (%)
15.0001               38      0.2%
15                    1       < 0.1%
14.9009               1       < 0.1%
14.5833               1       < 0.1%
14.4219               1       < 0.1%

median_house_value
Real number (ℝ≥0)

Distinct: 3694
Distinct (%): 21.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 207300.9124
Minimum: 14999
Maximum: 500001
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 14999
5-th percentile: 66000
Q1: 119400
median: 180400
Q3: 265000
95-th percentile: 495500
Maximum: 500001
Range: 485002
Interquartile range (IQR): 145600

Descriptive statistics

Standard deviation: 115983.7644
Coefficient of variation (CV): 0.5594947126
Kurtosis: 0.3039975986
Mean: 207300.9124
Median Absolute Deviation (MAD): 68800
Skewness: 0.9730366335
Sum: 3524115510
Variance: 1.34522336 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count   Frequency (%)
500001               814     4.8%
137500               95      0.6%
162500               89      0.5%
112500               85      0.5%
187500               74      0.4%
225000               73      0.4%
87500                64      0.4%
350000               64      0.4%
150000               52      0.3%
67500                51      0.3%
Other values (3684)  15539   91.4%

Minimum 5 values

Value                Count   Frequency (%)
14999                4       < 0.1%
17500                1       < 0.1%
22500                3       < 0.1%
25000                1       < 0.1%
26600                1       < 0.1%

Maximum 5 values

Value                Count   Frequency (%)
500001               814     4.8%
500000               22      0.1%
499100               1       < 0.1%
499000               1       < 0.1%
498800               1       < 0.1%
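The maximum value 500001 accounts for 814 of the 17000 observations, far more than any neighbouring value, which suggests the variable was capped at that level. The report does not flag this, so the following is only a hypothetical heuristic: it marks a variable as possibly capped when its largest value holds more than a chosen share of the rows. The function name and the 1% threshold are made up for illustration.

```python
N_OBSERVATIONS = 17000
CAP_SHARE_THRESHOLD = 0.01  # hypothetical cut-off, chosen for illustration

# (value, count) pairs for the five largest median_house_value entries.
largest_house_values = [
    (500001, 814), (500000, 22), (499100, 1), (499000, 1), (498800, 1),
]
# Same for total_rooms, which shows no pile-up at its maximum.
largest_total_rooms = [
    (37937, 1), (32627, 1), (32054, 1), (30405, 1), (30401, 1),
]


def looks_capped(extremes, total, threshold=CAP_SHARE_THRESHOLD):
    """Return True if the largest value covers more than `threshold` of rows."""
    _, top_count = max(extremes)  # tuples compare by value first
    return top_count / total > threshold


print(looks_capped(largest_house_values, N_OBSERVATIONS))
print(looks_capped(largest_total_rooms, N_OBSERVATIONS))
```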

Interactions

Correlations

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

index  longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
0      -114.31    34.19     15.0                5612.0       1283.0          1015.0      472.0       1.4936         66900.0
1      -114.47    34.40     19.0                7650.0       1901.0          1129.0      463.0       1.8200         80100.0
2      -114.56    33.69     17.0                720.0        174.0           333.0       117.0       1.6509         85700.0
3      -114.57    33.64     14.0                1501.0       337.0           515.0       226.0       3.1917         73400.0
4      -114.57    33.57     20.0                1454.0       326.0           624.0       262.0       1.9250         65500.0
5      -114.58    33.63     29.0                1387.0       236.0           671.0       239.0       3.3438         74000.0
6      -114.58    33.61     25.0                2907.0       680.0           1841.0      633.0       2.6768         82400.0
7      -114.59    34.83     41.0                812.0        168.0           375.0       158.0       1.7083         48500.0
8      -114.59    33.61     34.0                4789.0       1175.0          3134.0      1056.0      2.1782         58400.0
9      -114.60    34.83     46.0                1497.0       309.0           787.0       271.0       2.1908         48100.0
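These first rows also illustrate the monotonicity entries reported for each variable: longitude never increases from one row to the next, while latitude moves in both directions. A small sketch of such a check on the ten rows above; the function names are illustrative, and ten rows naturally cannot confirm the property for the whole column.

```python
# longitude and latitude from the ten first rows of the sample.
longitude = [-114.31, -114.47, -114.56, -114.57, -114.57,
             -114.58, -114.58, -114.59, -114.59, -114.60]
latitude = [34.19, 34.40, 33.69, 33.64, 33.57,
            33.63, 33.61, 34.83, 33.61, 34.83]


def is_non_increasing(values):
    """True if every value is less than or equal to its predecessor."""
    return all(a >= b for a, b in zip(values, values[1:]))


def is_strictly_decreasing(values):
    """True if every value is strictly less than its predecessor."""
    return all(a > b for a, b in zip(values, values[1:]))


print(is_non_increasing(longitude))       # decreasing, ties allowed
print(is_strictly_decreasing(longitude))  # ties break strictness
print(is_non_increasing(latitude))
```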

Last rows

index  longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
16990  -124.22    41.73     28.0                3003.0       699.0           1530.0      653.0       1.7038         78300.0
16991  -124.23    41.75     11.0                3159.0       616.0           1343.0      479.0       2.4805         73200.0
16992  -124.23    40.81     52.0                1112.0       209.0           544.0       172.0       3.3462         50800.0
16993  -124.23    40.54     52.0                2694.0       453.0           1152.0      435.0       3.0806         106700.0
16994  -124.25    40.28     32.0                1430.0       419.0           434.0       187.0       1.9417         76100.0
16995  -124.26    40.58     52.0                2217.0       394.0           907.0       369.0       2.3571         111400.0
16996  -124.27    40.69     36.0                2349.0       528.0           1194.0      465.0       2.5179         79000.0
16997  -124.30    41.84     17.0                2677.0       531.0           1244.0      456.0       3.0313         103600.0
16998  -124.30    41.80     19.0                2672.0       552.0           1298.0      478.0       1.9797         85800.0
16999  -124.35    40.54     52.0                1820.0       300.0           806.0       270.0       3.0147         94600.0